Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis

Authors

  • Thomas Ewender
  • Sarah Hoffmann
  • Beat Pfister
Abstract

We present a new method for estimating a continuous fundamental frequency (F0) contour. The algorithm performs a global optimization and yields virtually error-free F0 contours for high-quality speech signals. These F0 contours are subsequently used to extract a continuous fundamental wave. Some local properties of this wave, together with a number of other speech features, allow the frames of a speech signal to be classified into five classes: voiced, unvoiced, mixed, irregularly glottalized, and silence. The presented F0 detection and frame classification can be applied to F0 modeling and prosodic modification of speech segments in high-quality concatenative speech synthesis.
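The abstract does not say which global optimization the authors use. As a rough illustration of why optimizing over the whole utterance helps, the sketch below uses dynamic programming (a Viterbi-style search) over per-frame F0 candidates, which is one common way to do this and is an assumption here, not the paper's algorithm. The function name `track_f0`, the candidate format, and the `jump_weight` parameter are all hypothetical.

```python
import math


def track_f0(candidates, jump_weight=1.0):
    """Pick one F0 candidate per frame so that the total cost is minimal.

    candidates: list of frames; each frame is a non-empty list of
                (f0_hz, local_cost) tuples (hypothetical input format).
    jump_weight: penalty per octave of F0 change between adjacent frames
                 (an assumed smoothness term, not taken from the paper).
    """
    # best[t][j] = minimal total cost of any path ending at candidate j of frame t
    best = [[cost for _, cost in candidates[0]]]
    back = []  # back[t-1][j] = index of the best predecessor in frame t-1

    for t in range(1, len(candidates)):
        row, ptr = [], []
        for f_cur, c_cur in candidates[t]:
            # Cost of reaching this candidate from each candidate of the previous frame
            costs = [
                best[t - 1][i] + jump_weight * abs(math.log2(f_cur / f_prev))
                for i, (f_prev, _) in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[i_min] + c_cur)
            ptr.append(i_min)
        best.append(row)
        back.append(ptr)

    # Backtrack from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()

    return [candidates[t][j][0] for t, j in enumerate(path)]
```

With a positive `jump_weight`, an octave-error candidate that looks slightly better locally is rejected because it would require two large jumps; with `jump_weight=0.0` the search degenerates to a frame-by-frame choice and keeps the octave error.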


Similar resources

An Overview of Prosodic Modelling for Croatian Speech Synthesis

To include prosody in text-to-speech (TTS) systems, prosody knowledge needs to be acquired, represented and incorporated. The two main prosodic features that matter for modelling prosody in TTS systems are duration and F0 contour. There are various approaches to modelling these features, and they can be categorized into three main groups: rule-based, statistical and minimalistic. Some...


An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis

A recurrent-neural-network-based F0 model for text-to-speech (TTS) synthesis that generates F0 contours given textual features is proposed. In contrast to related F0 models, the proposed one is designed to learn the temporal correlation of F0 contours at multiple levels. The frame-level correlation is covered by feeding back the F0 output of the previous frame as the additional input of the cur...


An NN-based Approach to Prosodic for Synthesizing English Words Em

In this paper, a neural network-based approach to generating proper prosodic information for spelling/reading English words embedded in background Chinese texts is discussed. It expands an existing RNN-based prosodic information generator for Mandarin TTS to an RNN-MLP scheme for Mandarin-English mixed-lingual TTS. It first treats each English word as a Chinese word and uses the RNN, trained fo...


Simple Construction of a Frame which is $\epsilon$-nearly Parseval and $\epsilon$-nearly Unit Norm

In this paper, we will provide a simple method for starting with a given finite frame for an $n$-dimensional Hilbert space $\mathcal{H}_n$ with nonzero elements and producing a frame which is $\epsilon$-nearly Parseval and $\epsilon$-nearly unit norm. Also, the concept of the $\epsilon$-nearly equal frame operators for two given frames is presented. Moreover, we characterize all bounded invertible ...


Contours Extraction Using Line Detection and Zernike Moment

Most contour detection methods suffer from drawbacks such as noise, occlusion of objects, and shifting, scaling and rotation of objects in the image, which reduce recognition accuracy. To solve this problem, this paper utilizes Zernike Moment (ZM) and Pseudo Zernike Moment (PZM) to extract object contour features in all situations such as rotation, scaling and shifting of object i...




Publication date: 2009